In [1]:
import networkx as nx
import matplotlib.pyplot as plt

%matplotlib inline

Cliques, Triangles and Squares

Let's pose a problem: If A knows B and B knows C, would it be probable that A knows C as well? In a graph involving just these three individuals, it may look as such:


In [2]:
G = nx.Graph()
G.add_nodes_from(['a', 'b', 'c'])
G.add_edges_from([('a','b'), ('b', 'c')])
nx.draw(G, with_labels=True)


Let's think of another problem: If A knows B, B knows C, C knows D and D knows A, is it likely that A knows C and B knows D? How would this look like?


In [3]:
G.add_node('d')
G.add_edge('c', 'd')
G.add_edge('d', 'a')
nx.draw(G, with_labels=True)


The set of relationships involving A, B and C, if closed, involves a triangle in the graph. The set of relationships that also include D form a square.

You may have observed that social networks (LinkedIn, Facebook, Twitter etc.) have friend recommendation systems. How exactly do they work? Apart from analyzing other variables, closing triangles is one of the core ideas behind the system. A knows B and B knows C, then A probably knows C as well.

If all of the triangles in the two small-scale networks were closed, then the graph would have represented cliques, in which everybody within that subgraph knows one another.

In this section, we will attempt to answer the following questions:

  1. Can we identify cliques?
  2. Can we identify potential cliques that aren't captured by the network?
  3. Can we model the probability that two unconnected individuals know one another?

As usual, let's start by loading the synthetic network.


In [4]:
# Load the network.
G = nx.read_gpickle('Synthetic Social Network.pkl')
nx.draw(G, with_labels=True)


Cliques

In a social network, cliques are groups of people in which everybody knows everybody. Triangles are a simple example of cliques. Let's try implementing a simple algorithm that finds out whether a node is present in a triangle or not.

The core idea is that if a node is present in a triangle, then its neighbors' neighbors' neighbors should include itself.


In [5]:
# Example code that shouldn't be too hard to follow.
def in_triangle(G, node):
    neighbors1 = G.neighbors(node)
    neighbors2 = []
    for n in neighbors1:
        neighbors = G.neighbors(n)
        if node in neighbors2:
            neighbors2.remove(node)
        neighbors2.extend(G.neighbors(n))
    
    neighbors3 = []
    for n in neighbors2:
        neighbors = G.neighbors(n)
        neighbors3.extend(G.neighbors(n))
        
    if node in neighbors3:
        return True
    else:
        return False
    
in_triangle(G, 3)


Out[5]:
True

In reality, NetworkX already has a function that counts the number of triangles that any given node is involved in. This is probably more useful than knowing whether a node is present in a triangle or not, but the above code was simply for practice.


In [6]:
nx.triangles(G, 3)


Out[6]:
3

Exercise

Can you write a function that takes in one node and its associated graph as an input, and returns a list or set of itself + all other nodes that it is in a triangle relationship with?

Hint: The neighbor of my neighbor should also be my neighbor, then the three of us are in a triangle relationship.

Hint: Python Sets may be of great use for this problem. https://docs.python.org/2/library/stdtypes.html#set

Verify your answer by drawing out the subgraph composed of those nodes.


In [7]:
# Possible answer
def get_triangles(G, node):
    neighbors = set(G.neighbors(node))
    triangle_nodes = set()
    """
    Fill in the rest of the code below.
    """
    for n in neighbors:
        neighbors2 = set(G.neighbors(n))
        neighbors.remove(n)
        neighbors2.remove(node)
        triangle_nodes.update(neighbors2.intersection(neighbors))
        neighbors.add(n)
    triangle_nodes.add(node)
    return triangle_nodes

# Verify your answer with the following funciton call. Should return:
# {1, 2, 3, 6, 23}
get_triangles(G, 3)


Out[7]:
{1, 2, 3, 6, 23}

In [8]:
# Then, draw out those nodes.
nx.draw(G.subgraph(get_triangles(G, 3)), with_labels=True)



In [9]:
neighbors3 = G.neighbors(3)
neighbors3.append(3)
nx.draw(G.subgraph(neighbors3), with_labels=True)


Friend Recommendation: Open Triangles

Now that we have some code that identifies closed triangles, we might want to see if we can do some friend recommendations by looking for open triangles.

Open triangles are like those that we described earlier on - A knows B and B knows C, but C's relationship with A isn't captured in the graph.

Exercise

Can you write a function that identifies, for a given node, the other two nodes that it is involved with in an open triangle, if there is one?

Hint: You may still want to stick with set operations. Suppose we have the A-B-C triangle. If there are neighbors of C that are also neighbors of B, then those neighbors are in a triangle with B and C; consequently, if there are nodes for which C's neighbors do not overlap with B's neighbors, then those nodes are in an open triangle. The final implementation should include some conditions, and probably won't be as simple as described above.


In [10]:
# Possible Answer, credit Justin Zabilansky (MIT) for help on this.
def get_open_triangles(G, node):
    """
    There are many ways to represent this. One may choose to represent only the nodes involved 
    in an open triangle; this is not the approach taken here.
    
    Rather, we have a code that explicitly enumrates every open triangle present.
    """
    open_triangle_nodes = []
    neighbors = set(G.neighbors(node))
    
    for n in neighbors:
        neighbors2 = set(G.neighbors(n))
        neighbors2.remove(node)
        
        overlaps = set()
        for n2 in neighbors2:
            if n2 in neighbors:
                overlaps.add(n2)
                
        difference = neighbors.difference(overlaps)
        difference.remove(n)
        
        for n2 in difference:
            if set([node, n, n2]) not in open_triangle_nodes:
                open_triangle_nodes.append(set([node, n, n2]))
    return open_triangle_nodes
# # Uncomment the following code if you want to draw out each of the triplets.
# nodes = get_open_triangles(G, 2)
# for i, triplet in enumerate(nodes):
#     fig = plt.figure(i)
#     nx.draw(G.subgraph(triplet), with_labels=True)
print(get_open_triangles(G, 3))
len(get_open_triangles(G, 3))


[{8, 1, 3}, {8, 2, 3}, {8, 3, 6}, {8, 3, 23}, {1, 3, 6}, {1, 3, 23}, {2, 3, 23}]
Out[10]:
7

If you remember the previous section on hubs and paths, you will note that node 19 was involved in a lot of open triangles.

Triangle closure is also the core idea behind social networks' friend recommendation systems; of course, it's definitely more complicated than what we've implemented here.


In [ ]: